Value-Directed Sampling Methods for POMDPs

Authors

  • Pascal Poupart
  • Luis E. Ortiz
  • Craig Boutilier
Abstract

We consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially observable Markov decision process (POMDP). While particle filtering has become a widely used tool in AI for monitoring dynamical systems, rather scant attention has been paid to its use in the context of decision making. Assuming the existence of a value function, we derive error bounds on decision quality associated with filtering using importance sampling. We also describe an adaptive procedure that can be used to dynamically determine the number of samples required to meet specific error bounds. Empirical evidence is offered supporting this technique as a profitable means of directing sampling effort where it is needed to distinguish policies.
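To make the abstract's setup concrete, here is a minimal sketch of particle-filter belief monitoring for a small discrete POMDP. Everything in it (the model sizes, the matrices `T` and `Z`, the function names) is invented for illustration; the paper pairs this kind of update with value-function-based error bounds and an adaptive sample count, which the sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discrete POMDP (sizes and matrices are made up for this sketch):
# T[a][s, s'] = P(s' | s, a) and Z[a][s', o] = P(o | s', a).
n_states, n_actions, n_obs = 4, 2, 3

def row_stochastic(rows, cols):
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

T = [row_stochastic(n_states, n_states) for _ in range(n_actions)]
Z = [row_stochastic(n_states, n_obs) for _ in range(n_actions)]

def pf_update(particles, a, o):
    """One importance-sampling step of belief monitoring: propagate each
    particle through T, weight by the observation likelihood from Z,
    then resample to return an unweighted particle set."""
    nxt = np.array([rng.choice(n_states, p=T[a][s]) for s in particles])
    w = Z[a][nxt, o].astype(float)
    # Guard against a degenerate set where every particle has zero likelihood.
    w = np.full(len(w), 1 / len(w)) if w.sum() == 0 else w / w.sum()
    return nxt[rng.choice(len(nxt), size=len(nxt), p=w)]

def belief(particles):
    """Empirical belief state induced by the particle set."""
    return np.bincount(particles, minlength=n_states) / len(particles)

# Example: monitor the belief through a fixed action/observation sequence.
parts = rng.choice(n_states, size=200)   # particles from a uniform initial belief
for a, o in [(0, 1), (1, 2), (0, 0)]:
    parts = pf_update(parts, a, o)
print(belief(parts))
```

Each call to `pf_update` performs the importance-sampling step the abstract refers to: particles are propagated through the transition model and weighted by the observation likelihood. The paper's adaptive procedure would, roughly, grow the particle set until a bound derived from the value function is tight enough to distinguish the candidate actions.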

Similar resources

Point-Based Value Iteration for Continuous POMDPs

We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value fu...
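The continuous-space machinery is this paper's contribution and is beyond a short sketch, but the discrete point-based backup that such methods generalize can be written compactly. The toy model below (`T`, `Z`, `R`, the belief set `B`) is invented for illustration and is not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_obs, gamma = 4, 2, 3, 0.95

def row_stochastic(rows, cols):
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

T = [row_stochastic(n_states, n_states) for _ in range(n_actions)]  # P(s'|s,a)
Z = [row_stochastic(n_states, n_obs) for _ in range(n_actions)]     # P(o|s',a)
R = rng.random((n_actions, n_states))                               # R(a, s)

def backup(b, Gamma):
    """Point-based Bellman backup at belief point b: returns the single
    alpha-vector that is optimal at b given the current set Gamma."""
    best_alpha, best_val = None, -np.inf
    for a in range(n_actions):
        g = R[a].copy()
        for o in range(n_obs):
            # Back-project each alpha-vector through (T, Z) for this
            # observation and keep the one with the highest value at b.
            proj = [gamma * T[a] @ (Z[a][:, o] * alpha) for alpha in Gamma]
            g += proj[int(np.argmax([b @ p for p in proj]))]
        if b @ g > best_val:
            best_alpha, best_val = g, b @ g
    return best_alpha

# Repeated sweeps over a fixed set of belief points.
B = [np.full(n_states, 1 / n_states)] + [row_stochastic(1, n_states)[0] for _ in range(4)]
Gamma = [np.zeros(n_states)]
for _ in range(20):
    Gamma = [backup(b, Gamma) for b in B]
print(max(b @ alpha for b in B for alpha in Gamma))
```

Each sweep replaces the alpha-vector set `Gamma` with one backed-up vector per belief point, which is what keeps point-based methods tractable relative to exact value iteration.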


Value-Directed Sampling Methods for Monitoring POMDPs

We consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially observable Markov decision process (POMDP). While particle filtering has become a widely used tool in AI for monitoring dynamical systems, rather scant attention has been paid to its use in the context of decision making. Assuming the existence of a...


Monte Carlo POMDPs

We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sa...
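This snippet's use of importance sampling for representing beliefs can be illustrated with weighted particles, where the observation likelihood multiplies into the weights rather than forcing an immediate resample. The paper works with real-valued state and action spaces; the discrete sketch below, with invented model matrices and a stand-in value function `V`, only shows the weighting mechanics:

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, n_obs = 4, 2, 3

def row_stochastic(rows, cols):
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

T = [row_stochastic(n_states, n_states) for _ in range(n_actions)]  # P(s'|s,a)
Z = [row_stochastic(n_states, n_obs) for _ in range(n_actions)]     # P(o|s',a)

def sis_update(particles, weights, a, o):
    """Sequential importance sampling: the belief is a weighted particle
    set, and the observation likelihood folds into the weights."""
    nxt = np.array([rng.choice(n_states, p=T[a][s]) for s in particles])
    w = weights * Z[a][nxt, o]
    return nxt, w / w.sum()

def expected_value(particles, weights, V):
    """Monte Carlo estimate of a state-value function under the belief."""
    return float(np.dot(weights, V[particles]))

V = rng.random(n_states)                  # stand-in value function
parts = rng.choice(n_states, size=200)    # uniform initial belief
w = np.full(200, 1 / 200)
parts, w = sis_update(parts, w, a=0, o=1)
print(expected_value(parts, w, V))
```

A weighted set like this supports Monte Carlo estimates of any expectation over the belief, including the belief-state value functions that the paper's value iteration step learns.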


Vector-space Analysis of Belief-state Approximation for POMDPs

We propose a new approach to value-directed belief state approximation for POMDPs. The value-directed model allows one to choose approximation methods for belief state monitoring that have a small impact on decision quality. Using a vector-space analysis of the problem, we devise two new search procedures for selecting an approximation scheme that have much better computational properties than e...


Value-Directed Belief State Approximation for POMDPs

We consider the problem of belief-state monitoring for the purposes of implementing a policy for a partially observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimizing a measure such as KL-divergence between the true and estimated state) are not necessarily appropriate for POMDPs. Inste...
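The snippet's central claim, that a belief approximation close in KL-divergence need not preserve decision quality, is easy to make concrete. In the hypothetical two-state example below the alpha-vectors and beliefs are invented; `b_hat1` is much closer to the true belief in KL yet flips the greedy action, while `b_hat2` is farther away but decision-equivalent:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete beliefs."""
    return float(np.sum(p * np.log(p / q)))

def achieved_value(b_true, b_used, alphas):
    """Value obtained at the true belief when the action is chosen
    greedily from the (possibly approximate) belief b_used."""
    act = int(np.argmax([b_used @ alpha for alpha in alphas]))
    return float(b_true @ alphas[act])

# Hypothetical two-state value function: each row is an alpha-vector.
alphas = np.array([[100.0, 0.0], [0.0, 100.0]])
b_true = np.array([0.51, 0.49])

b_hat1 = np.array([0.49, 0.51])   # tiny KL, but flips the greedy action
b_hat2 = np.array([0.70, 0.30])   # much larger KL, same greedy action

for b_hat in (b_hat1, b_hat2):
    loss = achieved_value(b_true, b_true, alphas) - achieved_value(b_true, b_hat, alphas)
    print(f"KL = {kl(b_true, b_hat):.4f}, value loss = {loss:.2f}")
```

This is exactly the asymmetry a value-directed criterion is meant to capture: approximation error should be measured in units of lost value, not raw divergence between distributions.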



Journal:

Volume:   Issue:

Pages:   -

Publication year: 2001